{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dynamic Tensors\n",
    "\n",
    "In this notebook, we will see how to handle data of variable shape and sizes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# first we install hub\n",
    "# runtime enviroment\n",
    "\n",
    "!pip install hub"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note**: Restart the colab runtime as few packages has been updated or you may get error (<font color=\"red\">FileNotFoundError</font>)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hub.schema import Primitive, Audio, ClassLabel\n",
    "from hub import transform, schema\n",
    "\n",
    "import librosa\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from tqdm import tqdm\n",
    "\n",
    "from glob import glob\n",
    "from time import time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What if our dataset contains data with varying sizes?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "n_samples: 32000\n",
      "n_samples: 115542\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 192000\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 176400\n",
      "n_samples: 64589\n",
      "n_samples: 23373\n"
     ]
    }
   ],
   "source": [
    "fnames = glob(\"./Data/audio/*\")\n",
    "\n",
    "for fname in fnames:\n",
    "    print(\"n_samples:\", librosa.load(fname, sr=None)[0].shape[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (A) Defining a \"Dynamic\" Schema\n",
    "A schema is a python `dicts` that contains metadata about our dataset. \n",
    "\n",
    "In this example, we tell Hub that our files are variable in duration by passing in `shape=(None,)`. In return, we tell Hub that our files could be as large as 192,000 samples with `max_shape=(192000,)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_schema = {\n",
    "    \"wav\": Audio(shape=(None,), max_shape=(192000,), file_format=\"wav\")\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (B) Defining Transforms\n",
    "Transforms for dynamic tensors look the seame as transforms for static tensors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "@transform(schema=my_schema)\n",
    "def load_transform(sample):\n",
    "    \n",
    "    audio = librosa.load(sample, sr=None)[0]\n",
    "    \n",
    "    return {\n",
    "        \"wav\": audio\n",
    "    }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "hub.compute.transform.Transform"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds = load_transform(fnames) # returns a transform object\n",
    "type(ds)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (C) Finally, Execution!\n",
    "Hub lazily executes, so nothing happens until we invoke `store`. By invoking `store`, we apply `load_transform` to our dataset and push everything."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/mynameisvinn/anaconda3/lib/python3.8/site-packages/zarr/creation.py:210: UserWarning: ignoring keyword argument 'mode'\n",
      "  warn('ignoring keyword argument %r' % k)\n",
      "Computing the transormation: 100%|██████████| 14.0/14.0 [00:00<00:00, 20.9 items/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Elapsed time: 3.316145896911621\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "start = time()\n",
    "\n",
    "tag = \"mynameisvinn/vibrations\"\n",
    "ds2 = ds.store(tag)\n",
    "type(ds2)\n",
    "\n",
    "end = time()\n",
    "print(\"Elapsed time:\", end - start)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
